Sentiment Analysis using Linear Regression
Abstract
In this assignment we use linear regression to learn a linear model that predicts the rating of textual book reviews from Amazon.com. Despite its simplicity, the linear model performs fairly well. However, a clever choice of the model's various “ingredients”, such as feature selection and the regularization term, could further improve its accuracy. In our work we study unigram feature selection using (Boolean) Information Gain, and we compare the performance of the resulting Boolean and Multinomial models. We observe that including multiple features matters more than the difference between the Boolean and the Multinomial models. Another major consideration in such large data mining tasks is efficient representation and manipulation of the data. Implementing one’s own conjugate descent algorithm (a fascinating research area in its own right) is appealing, as it could yield a very fast and efficient solver. However, to focus on evaluating the model, we chose to use the sparse data structures in Matlab, and we report some statistics we collected about their efficiency.
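To make the distinction between the Boolean and Multinomial unigram representations concrete, here is a minimal Python sketch that builds both as sparse dictionaries and ranks words by Boolean information gain. It is an illustration only, not the authors' Matlab implementation: the toy reviews, the function names, and the choice to binarize ratings at 4 stars before computing entropy are all assumptions made for this example.

```python
import math
from collections import Counter

def multinomial_features(text):
    """Sparse unigram counts: word -> number of occurrences."""
    return Counter(text.lower().split())

def boolean_features(text):
    """Sparse unigram presence: word -> 1 if the word occurs at all."""
    return {word: 1 for word in set(text.lower().split())}

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(word, docs, labels):
    """Entropy reduction in the labels from knowing whether `word` is present."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional

# Toy data, invented for illustration (not taken from the paper's dataset).
reviews = ["great great book", "boring book", "great story", "boring plot"]
ratings = [5, 1, 4, 2]
labels = [r >= 4 for r in ratings]  # assumption: binarize ratings at 4 stars

docs = [boolean_features(r) for r in reviews]
vocab = sorted(set().union(*docs))
# Words ordered from most to least informative about the label.
ranked = sorted(vocab, key=lambda w: information_gain(w, docs, labels),
                reverse=True)
```

In this toy set, "great" and "boring" each split the reviews perfectly and so rank first, while "book" appears equally in both classes and carries no information. The two representations differ only in the stored value: the Multinomial vector records that "great" occurs twice in the first review, whereas the Boolean vector records only that it occurs.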
Similar resources
Forecasting Stock Prices using Sentiment Information in Annual Reports – A Neural Network and Support Vector Regression Approach
Stock price forecasting has been mostly realized using quantitative information. However, recent studies have demonstrated that sentiment information hidden in corporate annual reports can be successfully used to predict short-run stock price returns. Soft computing methods, like neural networks and support vector regression, have shown promising results in the forecasting of stock price due to...
Consumer Review Analysis with Linear Regression
Sentiment analysis aims to classify people’s sentiments towards a particular subject based on their opinions. There are a number of different classification methods that can be used to perform this analysis. In this paper, we perform sentiment analysis on a labeled dataset of Amazon product reviews using linear regression. The dataset consists of one million textual reviews, of which approximat...
Sentiment Analysis on Financial News Headlines using Training Dataset Augmentation
This paper discusses the approach taken by the UWaterloo team to arrive at a solution for the Fine-Grained Sentiment Analysis problem posed by Task 5 of SemEval 2017. The paper describes the document vectorization and sentiment score prediction techniques used, as well as the design and implementation decisions taken while building the system for this task. The system uses text vectorization mo...
DUTH at SemEval-2017 Task 5: Sentiment Predictability in Financial Microblogging and News Articles
We present the system developed by the team DUTH for participation in SemEval-2017 Task 5, Fine-Grained Sentiment Analysis on Financial Microblogs and News, in subtasks A and B. Our approach to determining the sentiment of Microblog Messages and News Statements & Headlines is based on linguistic preprocessing, feature engineering, and supervised machine learning techniques. To train our model,...
Learning Dow Jones From Twitter Sentiment
In 2010, Bollen used Twitter data to find high predictability of Twitter sentiment on the stock market [1]. We hypothesized that while Bollen’s analysis of the full breadth of the Twitter pipeline found significant results, fine-tuning the pipeline to only ‘high-impact’ financial tweets would improve the data signal and further improve results. As a result, we filtered a dat...